A Multiple Feature based Novel Approach for Identification of Printed Indian Scripts at Word Level
Authors
Abstract
In a country like India, where many different scripts are in use, automatic identification of printed script facilitates important applications such as automatic transcription of multilingual documents and selection of a script-specific OCR in a multilingual environment. This paper proposes a novel method to identify, at word level, the script type of a collection of documents printed in seven Indian languages: Bangla, Hindi, English, Malayalam, Oriya, Tamil and Kannada. Recognition is based on multiple features extracted using the Discrete Cosine Transform (DCT) and the Discrete Wavelet Transform (DWT). Script classification performance is analyzed using the K-nearest neighbor classifier, with majority voting between the outputs of the DCT-based and DWT-based methods. The proposed scheme thus exploits the strengths of both the DCT and DWT based features. The experiments gave an overall accuracy of 98.11%, which shows the superiority of the proposed multiple-feature scheme over several existing script identification schemes.

General Terms: Script Identification, Image Preprocessing, OCR.
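The pipeline described in the abstract (DCT and DWT features, K-nearest neighbor classification, majority voting) can be illustrated with a minimal sketch. The abstract does not specify the feature definitions, the value of K, or the exact voting rule, so everything below is an assumption made for illustration: the low-frequency DCT block, the one-level Haar subband energies, the pooling of neighbor labels, the function names, and the synthetic bar images with placeholder labels `script_A` and `script_B`.

```python
import math
from collections import Counter


def dct_features(img, keep=3):
    """Low-frequency corner (keep x keep) of a 2-D DCT-II, scaled by image area."""
    rows, cols = len(img), len(img[0])
    feats = []
    for u in range(keep):
        for v in range(keep):
            total = 0.0
            for x in range(rows):
                cx = math.cos(math.pi * (2 * x + 1) * u / (2 * rows))
                for y in range(cols):
                    cy = math.cos(math.pi * (2 * y + 1) * v / (2 * cols))
                    total += img[x][y] * cx * cy
            feats.append(total / (rows * cols))
    return feats


def haar_features(img):
    """Mean energy of the four subbands of a one-level 2-D Haar DWT."""
    rows, cols = len(img) // 2 * 2, len(img[0]) // 2 * 2
    energy = [0.0, 0.0, 0.0, 0.0]
    for x in range(0, rows, 2):
        for y in range(0, cols, 2):
            a, b = img[x][y], img[x][y + 1]
            c, d = img[x + 1][y], img[x + 1][y + 1]
            coeffs = (
                (a + b + c + d) / 2,  # approximation
                (a - b + c - d) / 2,  # detail across columns
                (a + b - c - d) / 2,  # detail across rows
                (a - b - c + d) / 2,  # diagonal detail
            )
            for i, coeff in enumerate(coeffs):
                energy[i] += coeff * coeff
    blocks = (rows // 2) * (cols // 2)
    return [e / blocks for e in energy]


def knn_labels(query, train_feats, train_labels, k):
    """Labels of the k training vectors closest to query (Euclidean distance)."""
    order = sorted(range(len(train_feats)),
                   key=lambda i: math.dist(query, train_feats[i]))
    return [train_labels[i] for i in order[:k]]


def classify_word(img, train_imgs, train_labels, k=3):
    """Assumed voting rule: pool the k-NN labels from both feature types."""
    votes = []
    for extract in (dct_features, haar_features):
        train_feats = [extract(t) for t in train_imgs]
        votes += knn_labels(extract(img), train_feats, train_labels, k)
    return Counter(votes).most_common(1)[0][0]


def bars(size, width, horizontal):
    """Synthetic stand-in for a word image: `width` bright rows or columns."""
    return [[1.0 if (x if horizontal else y) < width else 0.0
             for y in range(size)] for x in range(size)]


# Placeholder training set: horizontal bars vs. vertical bars.
train_imgs = [bars(8, 3, True), bars(8, 5, True),
              bars(8, 3, False), bars(8, 5, False)]
train_labels = ["script_A", "script_A", "script_B", "script_B"]

print(classify_word(bars(8, 7, True), train_imgs, train_labels))
```

In practice the images would be binarized, size-normalized word crops, and a library transform (for example an FFT-based DCT and a wavelet package) would replace the direct loops, which are written out here only to keep the sketch dependency-free.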
Similar resources
Handwritten Script Identification from a Bi-Script Document at Line Level using Gabor Filters
In a country like India, where many scripts are in use, automatic identification of printed and handwritten script facilitates many important applications including sorting of document images and searching online archives of document images. In this paper, a Gabor feature based approach is presented to identify different Indian scripts from handwritten document images. Eight popular In...
Handwritten Script Recognition Using DCT, Gabor Filter and Wavelet Features at Line Level
In a country like India, where many scripts are in use, automatic identification of printed and handwritten script facilitates many important applications including sorting of document images and searching online archives of document images. In this paper, a multiple feature based approach is presented to identify the script type of the collection of handwritten documents. Eight popula...
Statistical Texture Features based Handwritten and Printed Text Classification in South Indian Documents
In this paper, we use statistical texture features for handwritten and printed text classification. We primarily aim for word level classification in south Indian scripts. Words are first extracted from the scanned document. For each extracted word, statistical texture features are computed such as mean, standard deviation, smoothness, moment, uniformity, entropy and local range including local...
Global Approach for Script Identification using Wavelet Packet Based Features
In a multi script environment, an archive of documents having the text regions printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition (OCR), it is necessary to identify different script regions of the document. In this paper, a novel texture-based approach is presented to identify the script type of the collection of documen...
Entropy Based Texture Features Useful for Automatic Script Identification
In a multi script environment, a collection of documents printed in different scripts is in practice. For automatic processing of such documents through Optical Character Recognition, it is necessary to identify the script type of the document. In this paper, a novel texture-based approach is presented to identify the script type of the documents printed in three prioritized scripts Kannada, Hi...
Journal title:
Volume, issue:
Pages: -
Publication date: 2014